Amendments to the Claims:

Please add new Claims 31-33 as shown in the below claim listing. The Claim Listing
below will replace all prior versions of the claims in the application.


Claim Listing:


1. (Previously Presented) A method for analysing a subject genome sequence comprising
the steps of:

accessing a set of known biological fragments, the set being of a fixed number of
said known biological fragments, each known biological fragment in the set having a
respective representation;

comparing the respective representation of each known biological fragment from
the set to a subject genome sequence, for each known biological fragment said comparing
including (i) counting the number of times the respective representation of the known
biological fragment is found in the subject genome sequence and (ii) from said counted
number of times, forming a vector element, such that for each known biological fragment 
there is a respective vector element representing the number of times the respective 
representation of that known biological fragment is found in the subject genome 
sequence; 

from the formed vector elements, forming a vector having a length equal to the 
fixed number of known biological fragments in the provided set, such that the formed 
vector provides a uniform representation of the subject genome sequence; and 

providing the formed vector for use as input to a desired analysis, the uniform
representation provided by the formed vector enabling the formed vector to serve as 
normalized input; and 

using the uniform representation in the desired analysis, analyzing the subject
genome sequence including one of classifying, clustering or indexing the subject genome
sequence.

2. (Original) A method as claimed in Claim 1 wherein the set of known biological
fragments is from published databases of motifs or proteins.

3. (Previously Presented) A method as claimed in Claim 1 further comprising the step of: 

for each desired subject genome sequence, using said set of known biological 
fragments, repeating the comparing and forming steps such that a respective vector 
representation is formed and each desired subject genome sequence has a respective 
vector representation of a same length, said set of known biological fragments being a
same set used for all of said subject genome sequences. 

4. (Original) A method as claimed in Claim 3 wherein for each subject genome sequence,
having formed respective vector representations each of the same length, using the same 
length vector representation as input into one or more sequence analyses. 

5. (Original) A method as claimed in Claim 4 wherein the sequence analyses include one of 
indexing, classification and clustering. 
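
Illustrative sketch (not part of the claim language): the counting and vector-forming steps of Claims 1 and 3 might be pictured as below, assuming each known biological fragment is represented as a plain text string. The fragment set, the subject sequences, the function names, and the choice to count overlapping matches are all assumptions made for illustration.

```python
def count_occurrences(fragment: str, sequence: str) -> int:
    """Count occurrences of fragment in sequence (overlaps counted; an assumption)."""
    if not fragment:
        return 0
    count = 0
    start = sequence.find(fragment)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also found.
        start = sequence.find(fragment, start + 1)
    return count


def count_vector(fragments: list[str], sequence: str) -> list[int]:
    """One vector element per fragment, so the length equals the size of the set."""
    return [count_occurrences(fragment, sequence) for fragment in fragments]


# Hypothetical fixed fragment set and subject sequences, for illustration only.
FRAGMENTS = ["ATG", "TATA", "GG"]
subjects = {"seq_a": "ATGGGTATATA", "seq_b": "CCATG"}

# The same fragment set is reused for every subject sequence, so all
# resulting vectors have the same length regardless of sequence length.
vectors = {name: count_vector(FRAGMENTS, seq) for name, seq in subjects.items()}
```

Because every vector has one element per fragment, sequences of different lengths map to vectors of equal length, which is what allows them to be fed to a common indexing, classification or clustering step.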

6. (Canceled) 

7. (Original) A method as claimed in Claim 1 wherein the subject genome sequence is a 
DNA sequence or subsequence.

8. (Original) A method as claimed in Claim 1 wherein the counting includes determining
probability of the subject genome sequence being generated by the known biological
fragment.

9. (Original) A method as claimed in Claim 8 wherein the counting determining probability
employs a 0th order Markov model for each known biological fragment.
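
Illustrative sketch (not part of the claim language): one possible reading of a 0th order Markov model is that each symbol is drawn independently from a distribution estimated from the known fragment. The claims do not say how the model is estimated, so the symbol-frequency estimate, the pseudocount, the DNA alphabet, and the function names below are assumptions.

```python
import math
from collections import Counter


def zeroth_order_model(fragment: str, alphabet: str = "ACGT",
                       pseudocount: float = 1.0) -> dict[str, float]:
    """Estimate independent per-symbol probabilities from a fragment.

    The pseudocount (an assumption) keeps unseen symbols from getting
    probability zero.
    """
    counts = Counter(fragment)
    total = sum(counts[s] for s in alphabet) + pseudocount * len(alphabet)
    return {s: (counts[s] + pseudocount) / total for s in alphabet}


def log_probability(sequence: str, model: dict[str, float]) -> float:
    """Log-probability of the sequence under the 0th order model.

    With no dependence between positions, the probability is a product of
    per-symbol probabilities; logs are summed to avoid underflow on long
    sequences. Symbols outside the model's alphabet raise KeyError.
    """
    return sum(math.log(model[symbol]) for symbol in sequence)
```

Under this reading, the score for a fragment would be the (log-)probability that its model assigns to the subject sequence, and one such score per fragment would fill the vector.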

10. (Previously Presented) Apparatus for analyzing genome sequences, comprising:

a data store of representations of a predefined number of known biological 
sequences; 

a comparison routine executed by a digital processor having access to the data
store, the comparison routine comparing each known biological sequence from the data
store to a subject genome sequence and generating a score indicative of the comparison,
said scores forming a vector having a length equal to the predefined number of known
biological sequences, such that said comparison routine provides the formed vector as a
uniform representation of the subject genome sequence and the formed vector enables at
least one analysis of the subject genome sequence, the uniform representation of the
formed vector providing normalized input for the analysis,

wherein the generated score is one of a probability of the subject genome sequence being
generated by the known biological sequence or a counting of a number of occurrences of
the known biological sequence found in the subject genome sequence.

11. (Original) Apparatus as claimed in Claim 10 wherein the data store is a published
database of motifs or proteins.

12. (Previously Presented) Apparatus as claimed in Claim 10 further comprising a plurality
of different subject genome sequences; and 

wherein, using a same set of known biological sequences, the comparison routine 
forms, for each subject genome sequence, a respective vector such that a corresponding 
plurality of uniform length vector representations is provided. 

13. (Previously Presented) Apparatus as claimed in Claim 12 wherein the output of the
comparison routine feeds the corresponding plurality of uniform length vector 
representations into further analysis processors. 

14. (Original) Apparatus as claimed in Claim 13 wherein the further analysis processors
include at least one of a classifier, an indexer and a clustering member.


09/724,269

15. (Canceled)

16. (Original) Apparatus as claimed in Claim 10 wherein the subject genome sequence is a
DNA sequence or subsequence.

17-18. (Canceled)

19. (Previously Presented) A method for analyzing a subject protein sequence comprising the
steps of: 

(a) providing a set of known biological fragments, the set being of a fixed 
number of said known biological fragments, each known biological fragment in the set
having a respective representation; 

(b) comparing the respective representation of each known biological 
fragment from the set to a subject protein sequence, for each known biological fragment 
said comparing including (i) counting the number of times the known biological fragment 
is found in the subject protein sequence and (ii) from said counted number of times, 
forming a vector element, such that for each known biological fragment there is a 
respective vector element representing the number of times the respective representation 
of that known biological fragment is found in the subject protein sequence; 

(c) from the formed vector elements, forming a vector having a length equal
to the fixed number of known biological fragments in the provided set, such that the
formed vector provides a uniform representation of the subject protein sequence; and

(d) providing the vector for making at least one analysis of the subject protein 
sequence, the uniform representation of the vector providing normalized input for the 
analysis; and 

(e) analyzing the subject protein sequence by the analysis using the uniform
representation as input and classifying the subject protein sequence into a structural or a
functional class.

20. (Previously Presented) Apparatus for analyzing protein sequences, comprising:

a data store of a predefined number of known biological sequences; and 
a comparison routine executed by a digital processor having access to the data 
store, the comparison routine comparing each known biological sequence from the data
store to a subject protein sequence and generating a score indicative of the comparison,
said scores forming a vector having a length equal to the predefined number of known
biological sequences, such that said comparison routine provides the formed vector as a
uniform representation of the subject protein sequence for at least one analysis, the
uniform representation of the formed vector providing normalized input for the analysis,
wherein the generated score is one of a probability of the subject protein sequence
being generated by the known biological sequence or a counting of a number of
occurrences of the known biological sequence found in the subject protein sequence.

21. (Previously Presented) A method of analyzing a subject genome sequence comprising:

(a) providing a predefined set of known biological fragments, the set being of
a fixed number of said known biological fragments, each known
biological fragment in the set having a respective representation;

(b) providing a subject genome sequence;

(c) quantitatively determining a score of each known biological fragment in
the set with respect to the subject genome sequence; 

(d) forming a feature vector of the subject genome sequence, said feature 
vector being a sequence of scores of each known biological fragment in 
the set;

(e) using the feature vector, analyzing the subject genome sequence thereby
producing classification, clustering or indexing of the subject genome 
sequence. 

22. (Previously Presented) The method of Claim 21 wherein the respective representation of
each known biological fragment is a text string.

23. (Previously Presented) The method of Claim 22 wherein quantitatively determining a
score of each known biological fragment in the set includes for each known biological
fragment, counting the number of times the text string of the respective representation is
found within the subject genome sequence.


24. (Previously Presented) The method of Claim 21 wherein the respective representation of 
each known biological fragment is a probabilistic template, said template providing a
probability that a member of a group consisting of amino acids and nucleotides exists at a
pre-determined position of said known biological fragment. 

25. (Previously Presented) The method of Claim 24 wherein quantitatively determining a
score of each known biological fragment in the set includes for each known biological
fragment, computing the probability of existence of every subsequence of a
pre-determined length in the subject genome sequence according to the probabilistic template
that represents the known biological fragment.
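
Illustrative sketch (not part of the claim language): the probabilistic template of Claims 24 and 25 can be pictured as one probability table per position, applied to every window of the template's length in the subject sequence. The table layout, the treatment of positions as independent, and the function name are assumptions; the template values in the usage note are made up.

```python
def window_probabilities(template: list[dict[str, float]],
                         sequence: str) -> list[float]:
    """Probability of each window of the subject sequence under the template.

    template[i][symbol] is the probability that `symbol` occurs at position i
    of the known fragment. Positions are treated as independent (an
    assumption), so a window's probability is the product over positions.
    Symbols missing from a position's table get probability 0.0.
    """
    width = len(template)
    if width == 0:
        return []
    probabilities = []
    # Slide a window of the template's length across the subject sequence.
    for start in range(len(sequence) - width + 1):
        p = 1.0
        for offset, column in enumerate(template):
            p *= column.get(sequence[start + offset], 0.0)
        probabilities.append(p)
    return probabilities
```

For a hypothetical two-position template `[{"A": 0.5, "C": 0.5}, {"G": 1.0}]` and the sequence `"AGCG"`, the windows `AG`, `GC` and `CG` receive probabilities 0.5, 0.0 and 0.5. How the per-window values are reduced to a single score for the fragment (for example a sum or a maximum) is not stated in the claims and is left open here.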


26. (Previously Presented) Apparatus for analyzing a subject genome sequence, comprising: 

(a) an input device for inputting a subject genome sequence; 

(b) a data store of representations of a set of a predefined number of known
biological fragments; and 

(c) a scoring routine executed by a digital processor having access to the data 
store, the scoring routine quantitatively determining a score of each known
biological fragment in the set as compared against the subject genome
sequence, said scores forming a feature vector having a length equal to the
predefined number of known biological sequences; and

(d) an analyzing routine executed by a digital processor, the analyzing routine
using the feature vector to produce classification, indexing or clustering of
the subject genome sequence.

27. (Previously Presented) The apparatus of Claim 26 wherein each known biological
fragment in the set is represented by a respective text string.


28. (Previously Presented) The apparatus of Claim 27 wherein the scoring routine includes 
for each known biological fragment, counting the number of times the respective text 
string is found within the subject genome sequence.


29. (Previously Presented) The apparatus of Claim 26 wherein each known biological
fragment in the set is represented by a probabilistic template, said template providing a
probability that a member of a group consisting of amino acids and nucleotides exists at a
pre-determined position of said known biological fragment.

30. (Previously Presented) The apparatus of Claim 29 wherein the scoring routine includes
for each known biological fragment, computing the probability of existence of every
subsequence of a pre-determined length in the subject genome sequence according to the
probabilistic template that represents the known biological fragment.

31. (New) A method of assigning a subject genome sequence to a class, comprising:

(a) providing a set of known biological fragments, the set being of a fixed 
number of said known biological fragments, each known biological 
fragment in the set having a respective representation; 

(b) providing at least one training sequences;

(c) for each known biological fragment, quantitatively determining a score 
with respect to each training sequence; 

(d) for each training sequence, forming a training feature vector, said training 
feature vector being a sequence of scores of each known biological 
fragment with respect to the training sequence; 

(e) using the training feature vectors, classifying the training sequences,
thereby defining classes of sequences;

(f) providing a subject genome sequence;

(g) quantitatively determining a score of each known biological fragment with
respect to the subject genome sequence;

(h) forming a feature vector of the subject genome sequence, said feature
vector being a sequence of scores of each known biological fragment in
the set;

(i) using the feature vector and the training feature vectors, assigning the
subject genome sequence to at least one of the defined classes of
sequences, thereby producing classification of the subject genome
sequence.

32. (New) A method for assigning a subject protein sequence to a class, comprising:

(a) providing a set of known biological fragments, the set being of a fixed
number of said known biological fragments, each known biological
fragment in the set having a respective representation;

(b) comparing the respective representation of each known biological
fragment from the set to a subject protein sequence, for each known
biological fragment said comparing including

(i) counting the number of times the known biological fragment is
found in the subject protein sequence; and

(ii) from said counted number of times, forming a vector element,
such that for each known biological fragment there is a respective
vector element representing the number of times the respective
representation of that known biological fragment is found in the
subject protein sequence;

(c) from the vector elements, forming a feature vector having a length equal
to the fixed number of known biological fragments in the provided set, the
feature vector providing a uniform representation of the subject protein
sequence; and

(d) analyzing the subject protein sequence using the feature vector as input
and classifying the subject protein sequence into a structural or a
functional class,

wherein classifying the subject protein sequence includes:

(i) providing at least one training sequences;

(ii) for each known biological fragment, quantitatively determining a
score with respect to each training sequence;

(iii) for each training sequence, forming a training feature vector, said
training feature vector being a sequence of scores of each known
biological fragment with respect to the training sequence;

(iv) using the training feature vectors, classifying the training
sequences, thereby defining classes of sequences;

(v) using the feature vector and the training feature vectors, assigning
the subject protein sequence to at least one of the defined classes of 
sequences, thereby classifying the subject protein sequence into a 
structural or a functional class. 


33. (New) The apparatus of Claim 26 wherein the analyzing routine includes:

(i) providing at least one training sequences;

(ii) for each known biological fragment, quantitatively determining a score
with respect to each training sequence;

(iii) for each training sequence, forming a training feature vector, said training 
feature vector being a sequence of scores of each known biological 
fragment with respect to the training sequence; 

(iv) using the training feature vectors, classifying the training sequences,
thereby defining classes of sequences; 

(v) using the feature vector and the training feature vectors, assigning the 
subject protein sequence to at least one of the defined classes of 
sequences, thereby analyzing the subject protein sequence. 
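
Illustrative sketch (not part of the claim language): the training and assignment steps recited in Claims 31-33 could be realized in many ways. The example below swaps in a simple nearest-centroid rule and assumes that the classes of the training sequences are already known; the class labels, the Euclidean distance, and the function names are assumptions rather than anything the claims prescribe.

```python
import math
from collections import defaultdict


def class_centroids(training_vectors: list[list[float]],
                    labels: list[str]) -> dict[str, list[float]]:
    """Average the training feature vectors of each class, element by element."""
    groups: dict[str, list[list[float]]] = defaultdict(list)
    for vector, label in zip(training_vectors, labels):
        groups[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def assign_class(feature_vector: list[float],
                 centroids: dict[str, list[float]]) -> str:
    """Assign the subject's feature vector to the class with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(feature_vector, centroids[label]))
```

The comparison is only meaningful because the subject's feature vector and the training feature vectors are scored against the same fixed fragment set and therefore have the same length.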